Exploring GLOTREC catalogue

Published

January 14, 2026

Gimena del Rio & Romina De León

(HDLAB CONICET)

Notebook designed and maintained by Romina De León

Goals:

  • Download data from the GLOTREC repository
  • Standardize and export the dataset for exploration and analysis
  • Clean and prepare GLOTREC data related to Argentine textbooks
  • Explore the data and the relationships between:
    • Authors and Publishers
    • Publishers and School Subjects
    • Publishers, Authors, and School Subjects
  • Build visualizations similar to those currently available in GLOTREC, improved by focusing on specific periods.

Libraries to use

Description:

  • tidyverse: for data importing, cleaning, transformation, wrangling, and tabular representation; includes dplyr, tidyr, readr, purrr, stringr, and ggplot2.

  • stringr / stringi: for text normalization, pattern detection, string manipulation, and removing diacritics.

  • ggplot2: for statistical and exploratory data visualization.

  • treemapify: for treemap visualizations integrated with ggplot2.

  • igraph: for analyzing network structures, computing centrality measures, clustering, and other graph-theoretic operations.

  • readr: for fast reading of CSV, TSV, and delimited files (also part of tidyverse).

  • maps: for base map datasets (useful fallback for quick geospatial outlines).
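
As a small illustration of the diacritic-removal use mentioned for stringi, the sketch below transliterates accented characters to ASCII. The helper name `strip_diacritics` and the sample strings are assumptions made for this example only.

```r
library(stringi)

# Hypothetical helper: transliterate Latin characters with diacritics to ASCII
strip_diacritics <- function(x) {
  stri_trans_general(x, "Latin-ASCII")
}

samples <- c("Igón", "Atlántida", "Librería del Colegio")
strip_diacritics(samples)
```

Whether accents should be stripped at all is a design choice: it makes matching more tolerant, but it also changes how names are displayed in plots.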

Install any missing packages, then load the required libraries

Code
#| include: false
# Only needed once to install packages
rm(list = ls())
required_packages <- c(
  "tidyverse",
  "tidytext",
  "treemapify",
  "dplyr",
  "ggplot2",
  "ggiraph",
  "purrr",
  "stringr",
  "readr",
  "readxl",
  "viridis",
  "plotly",
  "RColorBrewer",
  "htmltools"
)

packages_to_install <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]

if(length(packages_to_install) > 0) {
  cat("Installing missing packages:", paste(packages_to_install, collapse = ", "), "\n")
  install.packages(packages_to_install, dependencies = TRUE)
} else {
  cat("All required packages are already installed.\n")
}
All required packages are already installed.
Code
invisible(lapply(required_packages, library, character.only = TRUE))
cat("✔ All libraries were loaded successfully.\n")
✔ All libraries were loaded successfully.
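
The installation check above scans the whole library with installed.packages(), which can be slow when many packages are installed. A lighter alternative, sketched here with a hypothetical helper `missing_pkgs`, is to ask for each package individually with requireNamespace(). The package names in the call are placeholders chosen for the example.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# "stats" and "utils" ship with R; the last name is a deliberately fake package
missing_pkgs(c("stats", "utils", "notARealPackage12345"))
```

The returned vector could then be passed to install.packages() in the same way as `packages_to_install`.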

Read the downloaded Excel file and display first rows

Code
df <- read_excel("data/itbc_export_2025.xlsx", 
    col_types = c("text"))  %>%
      select(!starts_with("Unnamed")) %>%
      mutate(Year = as.numeric(Year))

glimpse(df) #show df information
Rows: 335
Columns: 16
$ ID                   <chr> "1026966108", "1026966620", "1026966876", "102697…
$ `Call Number`        <chr> "RA G-26(2,2016)1", "RA G-26(2,2016)4", "RA G-26(…
$ `GLOTREC|Cat Link`   <chr> "gei1026966108", "gei1026966620", "gei1026966876"…
$ Catalogue            <chr> "GEI", "GEI", "GEI", "GEI", "GEI", "GEI", "GEI", …
$ `Library Catalogue`  <chr> "PPN=1026966108", "PPN=1026966620", "PPN=10269668…
$ Year                 <dbl> 2013, 2013, 2013, 2008, 2006, 2005, 1998, 1998, 1…
$ Publisher            <chr> "A-Z editora", "A-Z editora", "A-Z editora", "A-Z…
$ Place                <chr> "Ciudad Autónoma de Buenos Aires", "Ciudad Autóno…
$ Title                <chr> "Geografía de la Argentina María Julia Echeverría…
$ Authors              <chr> "Echeverría, María Julia | Capuz, Silvia María", …
$ Pages                <chr> "276 Seiten Illustrationen, Diagramme, Karten", "…
$ Format               <chr> "Book", "Book", "Book", "Book", "Book", "Book", "…
$ `School Subject`     <chr> "Geography", "Geography", "Geography", "Geography…
$ `Level of Education` <chr> "ISCED 3 - Upper secondary level", "ISCED 3 - Upp…
$ `Document Type`      <chr> "Textbook", "Textbook", "Textbook", "Textbook", "…
$ `Country of Use`     <chr> "Argentina", "Argentina", "Argentina", "Argentina…
Code
df %>% sample_n(5) # show random rows
# A tibble: 5 × 16
  ID        `Call Number` `GLOTREC|Cat Link` Catalogue `Library Catalogue`  Year
  <chr>     <chr>         <chr>              <chr>     <chr>               <dbl>
1 645041645 RA H-47(1,20… gei645041645       GEI       PPN=645041645        2007
2 10274610… RA H-61(1,20… gei1027461085      GEI       PPN=1027461085       2010
3 654843724 RA RA-44(13,… gei654843724       GEI       PPN=654843724        1956
4 645041920 RA H-47(1,20… gei645041920       GEI       PPN=645041920        2005
5 654596069 RA RA-17(1,5… gei654596069       GEI       PPN=654596069        1956
# ℹ 10 more variables: Publisher <chr>, Place <chr>, Title <chr>,
#   Authors <chr>, Pages <chr>, Format <chr>, `School Subject` <chr>,
#   `Level of Education` <chr>, `Document Type` <chr>, `Country of Use` <chr>
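
Because every column is read as text and Year is then coerced with as.numeric(), any year string that is not a plain number becomes NA (R only issues a warning). The sketch below shows one way such rows could be flagged; it uses an invented toy table, not the GLOTREC export, and the column names `Year_num` and `parse_failed` are assumptions.

```r
library(dplyr)

# Toy data (invented values): one year is not a plain number
toy <- tibble(Year = c("2013", "1956", "[ca. 1890]", NA))

checked <- toy %>%
  mutate(
    Year_num     = suppressWarnings(as.numeric(Year)),
    # TRUE only when a value was present but could not be parsed
    parse_failed = !is.na(Year) & is.na(Year_num)
  )

n_failed <- sum(checked$parse_failed)
n_failed
```

Inspecting the flagged rows before plotting helps distinguish records with no year from records whose year was lost in the conversion.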

Normalization of the Publisher column

  • Trim whitespace and convert to lowercase
  • Map publisher name variants to a canonical form with a mapping dictionary
  • Apply the normalization to the Publisher column
Code
df <- df %>%
  mutate(
    Publisher = str_trim(Publisher) |> str_to_lower(),
    Publisher = case_when(
      Publisher %in% c("a-z editora", "a-z ed.", "az editora") ~ "A-Z Editora",
      Publisher %in% c("estrada", "estrada secundaria", "angel estrada & cía.s.a.-editores") ~ "Estrada",
      Publisher %in% c("puerto de palos s.a. casa de édiciones", "puerto de palos") ~ "Puerto de Palos",
      Publisher %in% c("aique primaria", "aique secundaria", "aique") ~ "Aique",
      Publisher %in% c("kapelusz", "ed. kapelusz", "kapelusz norma") ~ "Kapelusz",
      Publisher %in% c("tinta fresca") ~ "Tinta Fresca",
      Publisher %in% c("doce orcas ediciones", "doce orcas ed.", "doce orcas") ~ "Doce Orcas",
      Publisher %in% c("ed. stella") ~ "Stella",
      Publisher %in% c("ed. atlántida") ~ "Atlántida",
      Publisher %in% c("losada") ~ "Losada",
      Publisher %in% c("ed. troquel") ~ "Troquel",
      Publisher %in% c("imprenta mercur") ~ "Mercur",
      Publisher %in% c("imprenta de pablo e. coni, especial para obras", "coni") ~ "Coni",
      Publisher %in% c("igon", "igón") ~ "Igon",
      Publisher %in% c("goethe-inst.") ~ "Goethe-Institut",
      Publisher %in% c("cesarini", "cesarini hnos. ed.") ~ "Cesarini",
      Publisher %in% c("producciones mawis") ~ "Mawis",
      Publisher %in% c("editorial h.m.e.") ~ "HME",
      Publisher %in% c(
        "imprenta y librería de mayo"
      ) ~ "Librería de Mayo",
      Publisher %in% c(
        "librería del colegio, alsina y bolívar",
        "cabaut, librería del colegio",
        "alsina & bolívar, librería del colegio",
        "librería del colegio"
      ) ~ "Librería del Colegio",
      Publisher %in% c("ed. crespillo", "f. crespillo", "f. crespillo editor") ~ "Crespillo",
      Publisher %in% c("ed. peuser", "peuser") ~ "Peuser",
      # Title-case only unmapped names, so canonical forms such as "HME" keep their casing
      TRUE ~ str_to_title(Publisher)
    )
  )
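
The mapping-dictionary idea can also be sketched as a lookup table held in a named character vector, which keeps the variants and their canonical names in one place. This is an alternative formulation, not the notebook's method: the table below holds only a few illustrative entries, and `publisher_map` and `normalize_publisher` are hypothetical names.

```r
library(dplyr)
library(stringr)

# Hypothetical lookup table: lowercase variant -> canonical name
publisher_map <- c(
  "a-z ed."          = "A-Z Editora",
  "az editora"       = "A-Z Editora",
  "ed. kapelusz"     = "Kapelusz",
  "editorial h.m.e." = "HME"
)

normalize_publisher <- function(x) {
  key <- str_to_lower(str_squish(x))
  canonical <- unname(publisher_map[key])   # NA when the variant is not in the table
  # Fall back to a title-cased version of the cleaned name
  coalesce(canonical, str_to_title(key))
}

demo <- c(" AZ Editora", "Ed. Kapelusz", "losada", NA)
normalize_publisher(demo)
```

A lookup table of this kind could also be stored in a CSV file and edited without touching the code.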

Create a function to normalize author names according to specified rules

1. Trim and collapse extra spaces (accent removal is present in the code but commented out)
2. If there is a comma, assume the "Last, First" format
3. Keep multi-part last names together
4. Keep only the first given name
5. Rebuild the normalized name as "Last, First"
6. If there is no comma, title-case the whole name

Apply function to Authors column

Code
normalizar_autor <- function(nombre) {
  # Missing, non-character, or blank input yields NA
  if (is.na(nombre) || !is.character(nombre) || str_trim(nombre) == "") {
    return(NA_character_)
  }

  # Trim and collapse extra spaces; accent removal is left disabled
  nombre <- nombre |>
    str_trim() |>
    # stringi::stri_trans_general("Latin-ASCII") |>
    str_squish()

  # If there's a comma, we assume "Last, First" format
  if (str_detect(nombre, ",")) {
    partes <- str_split(nombre, ",", n = 2)[[1]]
    apellido <- partes[1] |> str_trim() |> str_squish()
    resto <- partes[2] |> str_trim()

    # Select only the first given name
    primer_nombre <- if (resto == "") "" else str_split(resto, "\\s+")[[1]][1]

    # Rebuild normalized name
    nombre_norm <- paste0(str_to_title(apellido), ", ", str_to_title(primer_nombre))
  } else {
    # If no comma, just title case the whole name
    nombre_norm <- str_to_title(nombre)
  }

  return(str_trim(nombre_norm))
}

# Apply the function to the Authors column

df <- df %>%
  mutate(
    Authors = as.character(Authors),
    Authors = Authors |>
      replace_na("") |>                              # replace NA with an empty string
      str_split("\\|") |>                            # split multiple authors on "|"
      purrr::map(~ .x[.x != ""]) |>                  # drop empty entries
      purrr::map(~ map_chr(.x, normalizar_autor))    # normalize each name; Authors becomes a list column
  )
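
To see the rules in action on a single record, the sketch below applies a compact restatement of them to one pipe-separated author string taken from the sample output earlier in the notebook. The helper name `normalize_author_sketch` is hypothetical and the function is a shortened illustration, not a replacement for `normalizar_autor`.

```r
library(stringr)
library(purrr)

# Compact restatement of the normalization rules (hypothetical helper name)
normalize_author_sketch <- function(name) {
  name <- str_squish(name)
  if (is.na(name) || name == "") return(NA_character_)
  if (!str_detect(name, ",")) return(str_to_title(name))
  # Split once on the first comma: last name(s) before it, given names after it
  parts <- str_trim(str_split(name, ",", n = 2)[[1]])
  first_given <- str_split(parts[2], "\\s+")[[1]][1]
  str_trim(paste0(str_to_title(parts[1]), ", ", str_to_title(first_given)))
}

raw <- "Echeverría, María Julia | Capuz, Silvia María"
authors <- str_split(raw, "\\|")[[1]] |> map_chr(normalize_author_sketch)
authors
```

Keeping only the first given name merges variants such as "María Julia" and "María J.", at the cost of possibly conflating different people who share a surname and first name.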

Graph Section

1. Graph Publishers by Number of Books

Code
a <- df %>% 
  count(Publisher, sort = TRUE) %>% 
  mutate(Publisher = stringr::str_wrap(Publisher, width = 30)) %>%
          slice_head(n = 25) %>%
  ggplot(aes(x = reorder(Publisher, n), 
                y = n,
                fill = n,
                tooltip = paste0(
                  Publisher, "<br>",
                  "Count: ", n),
                data_id = Publisher)) +
  geom_col_interactive(color = "white", linewidth = 0.1,
    show.legend = FALSE ) + #, na.rm = TRUE) +
  scale_fill_viridis_c_interactive(option = "viridis") +
  coord_flip() +
  labs(
    title = "Top 25 Publishers by Count",
    x = "Publisher",
    y = "Count"
  ) + 
  guides(fill = "none") +
  theme_light() +
  theme(
    axis.text.y = element_text(size = 8),
    panel.grid.minor = element_blank()
  )

htmltools::div(style = "width:100%; height:400px;",
  girafe(
  ggobj = a,
  width_svg = 10,
  height_svg = 6,
  options = list(
    opts_sizing(rescale = TRUE),
    opts_hover(css = "fill:orange;cursor:pointer;"))
))

2. Graph Publishers by Number of Books and Level of Education

Code
p1 <- df %>%
  group_by(`Level of Education`, `Document Type`, `Publisher`) %>%
  summarise(Books_Count = n(), .groups = "drop") %>%
  filter(Books_Count > 3) %>%
  mutate(Publisher = reorder(Publisher, Books_Count)) %>%
ggplot(aes(x = Books_Count, 
                         y = Publisher, 
                         fill = Publisher,
                         tooltip = paste0(Publisher, ": ", Books_Count),
                         data_id = Publisher)) +
  geom_col_interactive(show.legend = FALSE, color = "gray", linewidth = 0.2) +
  facet_wrap(~`Level of Education`, ncol = 2, scales = "free") +
  scale_fill_brewer(palette = "Set3") + 
  labs(x = "Books Count", y = "Publisher") +
  theme_minimal() +
  theme(
    panel.grid.major.y = element_blank(), 
    panel.grid.minor = element_blank(),
    panel.border = element_rect(color = "lightgray", fill = NA),
    strip.text = element_text(face = "bold", size = 11),
    axis.text.y = element_text(size = 9),
    axis.text.x = element_text(size = 8),
    plot.background = element_rect(fill = "white", color = NA)
  )


htmltools::div(style = "width:100%; height:400px;",
      girafe(ggobj = p1, 
       options = list(
         opts_hover(css = "fill:orange;stroke:black;"), 
         opts_toolbar(saveaspng = FALSE)
       ),
       width_svg = 9, height_svg = 5.5))
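
The fill in this chart is mapped to Publisher using the Set3 Brewer palette, which provides at most 12 colors. If more publishers pass the filter, levels beyond the twelfth may be left without a palette color. The sketch below shows one possible workaround that interpolates extra colors; `set3_colors` is a hypothetical helper, not part of RColorBrewer.

```r
library(RColorBrewer)

# Maximum number of colors the Set3 palette provides
max_set3 <- brewer.pal.info["Set3", "maxcolors"]

# Hypothetical helper: return n fill colors, interpolating when n exceeds the palette
set3_colors <- function(n) {
  if (n <= max_set3) {
    # brewer.pal() needs at least 3 colors, so request 3 and subset if n is smaller
    brewer.pal(max(n, 3), "Set3")[seq_len(n)]
  } else {
    colorRampPalette(brewer.pal(max_set3, "Set3"))(n)
  }
}

length(set3_colors(15))
```

The resulting vector could be supplied through scale_fill_manual(values = ...) in place of scale_fill_brewer(). Interpolated colors are harder to tell apart than the original twelve.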

3. Heatmap of Publishers vs School Subjects

Code
p2 <- df %>%
  separate_rows(Publisher, sep = ", ") %>% 
  filter(`School Subject` != "German taught in non-German-speaking countries") %>%
  count(`School Subject`, Publisher) %>%
  mutate(n_masked = ifelse(n <= 3, NA, n)) %>%
ggplot(aes(x = Publisher, y = `School Subject`, fill = n_masked)) +
  geom_tile_interactive(aes(
    tooltip = paste0("Subject: ", `School Subject`, "<br>",
                     "Publisher: ", Publisher, "<br>",
                     "Count: ", n),
    data_id = Publisher),
    color = "white", linewidth = 0.5,
    show.legend = FALSE ) +
  scale_fill_distiller(palette = "YlGnBu", direction = 1, na.value = "white", name = "Books Count") +
  labs(
    title = "Count of Books by Publisher and Subject",
    x = "Publisher",
    y = "School Subject"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5, margin = margin(b=20)),
    axis.text.x = element_text(angle = 60, hjust = 1, size = 10),
    axis.text.y = element_text(size = 10),
    panel.grid = element_blank())

htmltools::div(style = "width:100%; height:400px;",
girafe(
  ggobj = p2,
  width_svg = 12,
  height_svg = 7,
  options = list(
    opts_sizing(rescale = TRUE),
    opts_hover(css = "stroke:black;stroke-width:2px;")
  )
))
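
The heatmap hides low counts by setting them to NA in a separate column while keeping the real count for the tooltip. A minimal sketch of that masking step on an invented toy table (the publisher and subject values are placeholders):

```r
library(dplyr)

toy <- tibble(
  `School Subject` = c(rep("History", 5), rep("Geography", 2)),
  Publisher        = c(rep("Estrada", 5), rep("Kapelusz", 2))
)

masked <- toy %>%
  count(`School Subject`, Publisher) %>%
  # Counts of 3 or fewer become NA; the real count stays in `n`
  mutate(n_masked = ifelse(n <= 3, NA, n))

masked
```

Tiles with an NA fill are drawn with the scale's `na.value`, so they blend into a white background while remaining hoverable.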

4. Histogram of Books by Year and Publisher

Code
p3 <- df %>%
  drop_na(Year) %>%
  separate_rows(Publisher, sep = ", ") %>%
  filter(Publisher %in% (count(., Publisher, sort = TRUE) %>% 
    slice_head(n = 25) %>% 
    pull(Publisher)
  )) %>%
  ggplot(aes(x = Year, fill = Publisher, data_id = Publisher)) +
  geom_histogram_interactive(aes(tooltip = after_stat(paste0("Publisher: ",
                                                             fill, "<br>",
                                                             "Year: ", round(x), "<br>",
                                                             "Count: ", count))),
    bins = 40, 
    position = "stack", 
    color = "white", 
    linewidth = 0.1,
    show.legend = FALSE 
  ) +
  scale_fill_manual(values = c(brewer.pal(12, "Paired"), 
                               brewer.pal(8, "Dark2"), 
                               brewer.pal(5, "Set1"))) +
  labs(
    title = "Distribution of Books by Year and Publisher (Top 25 Publishers)",
    x = "Year of Publication",
    y = "Book Count"
  ) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

htmltools::div(style = "width:100%; height:400px;",
girafe(
  ggobj = p3,
  width_svg = 12,
  height_svg = 7,
  options = list(
    opts_sizing(rescale = TRUE),
    opts_hover(css = "opacity:1;stroke:black;stroke-width:2px;"),
    opts_hover_inv(css = "opacity:0.3;") 
  )
))
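
The histogram keeps only the most frequent publishers by filtering against a nested count(). The same idea can be written in two explicit steps, sketched here on invented data with the top 2 publishers instead of the top 25:

```r
library(dplyr)

books <- tibble(Publisher = c("A", "A", "A", "B", "B", "C"))

# Step 1: names of the most frequent publishers
top_publishers <- books %>%
  count(Publisher, sort = TRUE) %>%
  slice_head(n = 2) %>%
  pull(Publisher)

# Step 2: keep only the rows for those publishers
filtered <- books %>%
  filter(Publisher %in% top_publishers)

nrow(filtered)
```

Note that slice_head() cuts at a fixed number of rows, so publishers tied at the cutoff are included or excluded by their order in the table.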

5. Heatmap of Publishers vs Level of Education

Code
p4 <- df %>% 
  mutate(`Level of Education` = str_split_i(replace_na(`Level of Education`, ""), "\\|", 1)) %>% 
  separate_rows(Publisher, sep = ", ") %>% 
  count(`Level of Education`, Publisher) %>%
  filter(n >= 4) %>%
  ggplot(aes(x = Publisher, y = `Level of Education`, fill = n)) +
  geom_tile_interactive(aes(
    tooltip = paste0("Publisher: ", Publisher, "<br>",
                     "Level of Education: ", `Level of Education`, "<br>",
                     "Books Count: ", n),
    data_id = Publisher
    ),
    color = "white", linewidth = 0.5,
    show.legend = FALSE) + 
  scale_fill_distiller(palette = "YlGnBu", direction = 1, name = "Books Count") +
  labs(
    title = "Count of Books by Publisher and Level of Education",
    x = "Publisher",
    y = "Level of Education"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5, margin = margin(b=20)),
    axis.text.x = element_text(angle = 60, hjust = 1, size = 11),
    axis.text.y = element_text(size = 11),
    panel.grid = element_blank()
  )

htmltools::div(style = "width:100%; height:400px;",
girafe(
  ggobj = p4,
  width_svg = 14,
  height_svg = 8,
  options = list(
    opts_sizing(rescale = TRUE),
    opts_hover(css = "stroke:black;stroke-width:2px;")
  )
))

6. Heatmap of Publishers vs Document Type

Code
p5 <- df %>% 
  mutate(`Document Type` = str_split_i(replace_na(`Document Type`, ""), "\\|", 1)) %>% 
  separate_rows(Publisher, sep = ", ") %>% 
  count(`Document Type`, Publisher) %>%
  filter(n >= 4) %>%
  ggplot(aes(x = Publisher, y = `Document Type`, fill = n)) +
  geom_tile_interactive(aes(
    tooltip = paste0("Publisher: ", Publisher, "<br>",
                     "Document Type: ", `Document Type`, "<br>",
                     "Books Count: ", n),
    data_id = Publisher
    ),
    color = "white", linewidth = 0.5,
    show.legend = FALSE) + 
  scale_fill_distiller(palette = "YlGnBu", direction = 1, name = "Books Count") +
  labs(
    title = "Count of Books by Publisher and Document Type",
    x = "Publisher",
    y = "Document Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5, margin = margin(b=20)),
    axis.text.x = element_text(angle = 60, hjust = 1, size = 11),
    axis.text.y = element_text(size = 11),
    panel.grid = element_blank()
  )

htmltools::div(style = "width:100%; height:400px;",
girafe(
  ggobj = p5,
  width_svg = 14,
  height_svg = 8,
  options = list(
    opts_sizing(rescale = TRUE),
    opts_hover(css = "stroke:black;stroke-width:2px;")
  )
))
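
Both heatmaps above keep only the first value of a pipe-separated field with str_split_i(). The sketch below shows the effect on invented values, assuming that multiple values are separated by " | " as in the Authors column. The str_trim() call is an extra step that is not in the notebook code.

```r
library(stringr)
library(tidyr)

# Invented example values; the separator " | " is an assumption
levels_raw <- c(
  "ISCED 2 - Lower secondary level | ISCED 3 - Upper secondary level",
  "ISCED 3 - Upper secondary level",
  NA
)

first_level <- levels_raw |>
  replace_na("") |>          # avoid NA propagating through the split
  str_split_i("\\|", 1) |>   # keep the piece before the first "|"
  str_trim()                 # drop the space left before "|"

first_level
```

Without the trim, a value taken from a multi-valued field would carry a trailing space and be counted as a separate category from the same value in a single-valued field.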

7. Collaboration Patterns between Publishing Houses and Authors via Sankey Flow

Code
p6 <- df %>%
  unnest(Authors) %>%
  filter(!is.na(Authors), Authors != "") %>%
  count(Publisher, Authors, name = "value") %>%
  filter(value >= 2) %>%
  
  # Sankey
  {
    d <- .
    
    # Nodes
    nodes_names <- unique(c(d$Publisher, d$Authors))
    nodes <- data.frame(name = nodes_names)
    
    links <- d %>%
      mutate(
        source = match(Publisher, nodes$name) - 1,
        target = match(Authors, nodes$name) - 1
      )
    
    pubs_count <- length(unique(d$Publisher))
    auths_count <- length(unique(d$Authors))
    
    plot_ly(
      type = "sankey",
      orientation = "h",
      node = list(
        label = nodes$name,
        pad = 15,
        thickness = 15,
        line = list(color = "white", width = 0.5),
        color = c(rep("#4C72B0", pubs_count), rep("#9909A9", auths_count))
      ),
      link = list(
        source = links$source,
        target = links$target,
        value = links$value,
        color = "rgba(150,150,150,0.3)"
      )
    ) %>%
      layout(
        autosize = TRUE,
        margin = list(l = 50, r = 50, t = 80, b = 40),
        title = list(
          text = paste0("Flow of Publications between Publishers and Authors<br>",
                        "<sup>", pubs_count, " publishers — ", auths_count, " authors</sup>"),
          font = list(size = 14)
        ),
        font = list(size = 10),
        height = 800
      )
  } %>% 
  htmltools::div(style = "width:100%; height:800px;")

p6
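
Plotly's sankey trace identifies nodes by zero-based position in the label vector, which is why the code subtracts 1 from match(). The sketch below reproduces that indexing on an invented three-link table and adds a check, not present in the notebook, that no name appears as both publisher and author, since the node coloring assumes the two groups do not overlap.

```r
library(dplyr)

# Invented links: publisher -> author with a count
links_toy <- tibble(
  Publisher = c("Estrada", "Estrada", "Kapelusz"),
  Authors   = c("Author A", "Author B", "Author A"),
  value     = c(3L, 2L, 2L)
)

# Publishers first, then authors, each name once
node_names <- unique(c(links_toy$Publisher, links_toy$Authors))

links_idx <- links_toy %>%
  mutate(
    source = match(Publisher, node_names) - 1,  # zero-based node index
    target = match(Authors, node_names) - 1
  )

# Names used in both roles would break the "publishers first" color layout
overlap <- intersect(links_toy$Publisher, links_toy$Authors)

links_idx
```

If `overlap` is not empty, one option is to suffix the labels of one group so that the two sets of names stay distinct.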

8. Alluvial Plots of Publishers vs Authors by Period

Code
p7 <- df %>%
  unnest(Authors) %>%  
  filter(Year >= 1860, Year <= 1900) %>%
  count(Publisher, Authors, name = "count") %>%
  filter(Authors != "", !is.na(Authors)) %>%
  filter(count >= 1) %>%

  plot_ly(
    type = 'parcats',
    dimensions = list(
      list(label = 'Publisher', values = ~Publisher),
      list(label = 'Author', values = ~Authors)
    ),
    line = list(
      color = ~count,
      colorscale = 'Viridis',
      showscale = FALSE
    ),
    hoveron = 'dimension',
    hoverinfo = 'count+probability'
  ) %>%
  
  layout(
    title = list(
      text = "<b>Publisher–Author Collaborations (1860–1900)</b>",
      x = 0.5,
      xanchor = 'center',
      font = list(size = 14, family = 'Arial Black')
    ),
    font = list(size = 11, family = 'Arial'),
    paper_bgcolor = 'white',
    plot_bgcolor = 'white',
    margin = list(l = 80, r = 80, t = 100, b = 50),
    height = 700
  ) %>%

  htmltools::div(style = "width:100%; height:750px; overflow:hidden;")
p7
Code
p8 <- df %>%
  unnest(Authors) %>%  
  filter(Year >= 1901, Year <= 1950) %>% 
  count(Publisher, Authors, name = "count") %>%
  filter(Authors != "", !is.na(Authors)) %>%

  plot_ly(
    type = 'parcats',
    dimensions = list(
      list(label = 'Publisher', values = ~Publisher),
      list(label = 'Author', values = ~Authors)
    ),
    line = list(
      color = ~count,
      colorscale = 'Viridis',
      showscale = FALSE,
      shape = 'hspline'
    ),
    hoveron = 'category',
    hoverinfo = 'count+probability+text'
  ) %>%
  
  layout(
    title = list(
      text = "<b>Publisher–Author Collaborations (1901–1950)</b>",
      x = 0.5,
      xanchor = 'center',
      font = list(size = 14, family = 'Arial Black')
    ),
    font = list(size = 11, family = 'Arial'),
    paper_bgcolor = 'white',
    plot_bgcolor = 'white',
    margin = list(l = 80, r = 80, t = 100, b = 50),
    height = 700
  ) %>%

  htmltools::div(style = "width:100%; height:750px; overflow:hidden;")
p8
Code
p9 <- df %>%
  unnest(Authors) %>%  
  filter(Year >= 1951, Year <= 1980) %>% 
  count(Publisher, Authors, name = "count") %>%
  filter(count >= 2) %>%
  filter(Authors != "", !is.na(Authors)) %>%
 
  plot_ly(
    type = 'parcats',
    dimensions = list(
      list(label = 'Publisher', values = ~Publisher),
      list(label = 'Author', values = ~Authors)
    ),
    line = list(
      color = ~count,
      colorscale = 'Viridis',
      showscale = FALSE,
      shape = 'hspline'
    ),
    hoveron = 'category',
    hoverinfo = 'count+probability+text'
  ) %>%
  
  layout(
    title = list(
      text = "<b>Publisher–Author Collaborations (Two or more Publications, 1951–1980)</b>",
      x = 0.5,
      xanchor = 'center',
      font = list(size = 14, family = 'Arial Black')
    ),
    font = list(size = 11, family = 'Arial'),
    paper_bgcolor = 'white',
    plot_bgcolor = 'white',
    margin = list(l = 80, r = 80, t = 100, b = 50),
    height = 800
  ) %>%

  htmltools::div(style = "width:100%; height:850px; overflow:hidden;")
p9
Code
p10 <- df %>%
  unnest(Authors) %>%  
  filter(Year >= 1981, Year <= 2000) %>% 
  count(Publisher, Authors, name = "count") %>%
  filter(count >= 2) %>%
  filter(Authors != "", !is.na(Authors)) %>%

  plot_ly(
    type = 'parcats',
    dimensions = list(
      list(label = 'Publisher', values = ~Publisher),
      list(label = 'Author', values = ~Authors)
    ),
    line = list(
      color = ~count,
      colorscale = 'Viridis',
      showscale = FALSE,
      shape = 'hspline'
    ),
    hoveron = 'category',
    hoverinfo = 'count+probability+text'
  ) %>%
  
  layout(
    title = list(
      text = "<b>Publisher–Author Collaborations (Two or more Publications, 1981–2000)</b>",
      x = 0.5,
      xanchor = 'center',
      font = list(size = 14, family = 'Arial Black')
    ),
    font = list(size = 11, family = 'Arial'),
    paper_bgcolor = 'white',
    plot_bgcolor = 'white',
    margin = list(l = 80, r = 80, t = 100, b = 50),
    height = 800
  ) %>%

  htmltools::div(style = "width:100%; height:850px; overflow:hidden;")
p10
Code
p11 <- df %>%
  unnest(Authors) %>%  
  filter(Year >= 2001, Year <= 2010) %>% 
  count(Publisher, Authors, name = "count") %>%
  filter(count >= 2) %>%
  filter(Authors != "", !is.na(Authors)) %>%

  plot_ly(
    type = 'parcats',
    dimensions = list(
      list(label = 'Publisher', values = ~Publisher ),
      list(label = 'Author', values = ~Authors)
    ),
    line = list(
      color = ~count,
      colorscale = 'Viridis',
      showscale = FALSE,
      shape = 'hspline'
    ),
    hoveron = 'category',
    hoverinfo = 'count+probability+text'
  ) %>%
  
  layout(
    title = list(
      text = "<b>Publisher–Author Collaborations (Two or more Publications, 2001–2010)</b>",
      x = 0.5,
      xanchor = 'center',
      font = list(size = 14, family = 'Arial Black')
    ),
    font = list(size = 11, family = 'Arial'),
    paper_bgcolor = 'white',
    plot_bgcolor = 'white',
    margin = list(l = 80, r = 80, t = 100, b = 50),
    height = 800
  ) %>%

  htmltools::div(style = "width:100%; height:850px;")
p11
Code
p12 <- df %>%
  unnest(Authors) %>%  
  filter(Year >= 2011) %>% 
  count(Publisher, Authors, name = "count") %>%
  filter(count >= 2) %>%
  filter(Authors != "", !is.na(Authors)) %>%

  plot_ly(
    type = 'parcats',
    dimensions = list(
      list(label = 'Publisher', values = ~Publisher ),
      list(label = 'Author', values = ~Authors)
    ),
    line = list(
      color = ~count,
      colorscale = 'Viridis',
      showscale = FALSE,
      shape = 'hspline'
    ),
    hoveron = 'category',
    hoverinfo = 'count+probability+text'
  ) %>%
  
  layout(
    title = list(
      text = "<b>Publisher–Author Collaborations (Two or more Publications, since 2011)</b>",
      x = 0.5,
      xanchor = 'center',
      font = list(size = 14, family = 'Arial Black')
    ),
    font = list(size = 11, family = 'Arial'),
    paper_bgcolor = 'white',
    plot_bgcolor = 'white',
    margin = list(l = 80, r = 80, t = 100, b = 50),
    height = 800
  ) %>%

  htmltools::div(style = "width:100%; height:850px;")
p12
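
The period plots above split the catalogue into consecutive year ranges with one filter() per plot. An alternative sketch, assuming each range is meant to include both years named in its title, is to label every year once with cut() so that no boundary year can fall between two plots. The break points and labels below are taken from the plot titles. The vector `years` is invented.

```r
years <- c(1860, 1900, 1901, 1950, 2010, 2011, 2013)

period <- cut(
  years,
  breaks = c(1859, 1900, 1950, 1980, 2000, 2010, Inf),
  labels = c("1860-1900", "1901-1950", "1951-1980",
             "1981-2000", "2001-2010", "2011+"),
  right  = TRUE   # intervals are (lower, upper], so 1900 falls in "1860-1900"
)

data.frame(years, period)
```

With a `period` column in place, each plot could filter on the label, and years before 1860 would show up as NA rather than being dropped silently.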

9. Relationship between School Subjects and Authors

Code
p13 <-  df %>%
  unnest(Authors) %>% 
  filter(`School Subject` != "German taught in non-German-speaking countries") %>% 
  count(`School Subject`, Authors, name = "count") %>%
  filter(count >= 4) %>%  # keep pairs with four or more books, matching the plot title
  filter(Authors != "", !is.na(Authors)) %>%
  mutate(n_masked = ifelse(count <= 3, NA, count)) %>% 
  
ggplot(aes(x = Authors, y = `School Subject`, fill = n_masked)) +
  geom_tile_interactive(aes(
    tooltip = paste0("School Subject: ", `School Subject`, "<br>",
                     "Authors: ", Authors, "<br>",
                     "Count: ", count),
    data_id = Authors),
    color = "white", linewidth = 0.5,
    show.legend = FALSE ) +
  scale_fill_distiller(palette = "YlGnBu", direction = 1, na.value = "white", name = "Books Count") +
  labs(
    title = "Count of Books by Authors and School Subject (authors with four or more books)",
    x = "Authors",
    y = "School Subject"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5, margin = margin(b=20)),
    axis.text.x = element_text(angle = 60, hjust = 1, size = 10),
    axis.text.y = element_text(size = 10),
    panel.grid = element_blank())

htmltools::div(style = "width:100%; height:400px;",
girafe(
  ggobj = p13,
  width_svg = 12,
  height_svg = 7,
  options = list(
    opts_sizing(rescale = TRUE),
    opts_hover(css = "stroke:black;stroke-width:2px;")
  )
))

10. Time series of book counts by authors

Code
p14 <- df %>%
  unnest(Authors) %>%
  filter(Year >= 1860 & Year <= 1950) %>%
  add_count(Authors, `School Subject`) %>%
  filter(n >= 2) %>%
  
  
  ggplot(aes(x = Year, 
             y = reorder(Authors, Year), 
             color = `School Subject`)) +
  geom_point_interactive(aes(
    tooltip = paste0("Author: ", Authors, "\n",
                    "Year: ", Year, "\n",
                    "Total in ", `School Subject`, ": ", n),
    data_id = Authors 
    
  ), size = 4, alpha = 0.7) +
  
  theme_minimal() +
  labs(title = "Publication timeline by author (1860-1950)",
       subtitle = "Includes authors with at least two publications in the same school subject",
       x = "Year", y = "Author") +
  theme(axis.text.y = element_text(size = 10))

htmltools::div(style = "width:100%; height:400px;",
girafe(ggobj = p14, 
       options = list(
         opts_hover(css = "fill:orange;stroke:black;cursor:pointer;"),
         opts_hover_inv(css = "opacity:0.2;"),
         opts_toolbar(saveaspng = TRUE)
       ),
       width_svg = 10, height_svg = 8))
Code
p15 <- df %>%
  unnest(Authors) %>%
  filter(Year >= 1951 & Year <= 1980) %>%
  add_count(Authors, `School Subject`) %>%
  filter(n >= 2) %>%
  
  
  ggplot(aes(x = Year, 
             y = reorder(Authors, Year), 
             color = `School Subject`)) +
  geom_point_interactive(aes(
    tooltip = paste0("Author: ", Authors, "\n",
                    "Year: ", Year, "\n",
                    "Total in ", `School Subject`, ": ", n),
    data_id = Authors 
    
  ), size = 4, alpha = 0.7) +
  
  theme_minimal() +
  labs(title = "Publication timeline by author (1951-1980)",
       subtitle = "Includes authors with at least two publications in the same school subject",
       x = "Year", y = "Author") +
  theme(axis.text.y = element_text(size = 10))

htmltools::div(style = "width:100%; height:400px;",
girafe(ggobj = p15, 
       options = list(
         opts_hover(css = "fill:orange;stroke:black;cursor:pointer;"),
         opts_hover_inv(css = "opacity:0.2;"),
         opts_toolbar(saveaspng = TRUE)
       ),
       width_svg = 10, height_svg = 8))
Code
p20 <- df %>%
  unnest(Authors) %>%
  filter(Year >= 1981 & Year <= 2000) %>%
  add_count(Authors, `School Subject`) %>%
  filter(n >= 2) %>%
  
  
  ggplot(aes(x = Year, 
             y = reorder(Authors, Year), 
             color = `School Subject`)) +
  geom_point_interactive(aes(
    tooltip = paste0("Author: ", Authors, "\n",
                    "Year: ", Year, "\n",
                    "Total in ", `School Subject`, ": ", n),
    data_id = Authors 
    
  ), size = 4, alpha = 0.7) +
  
  theme_minimal() +
  labs(title = "Publication timeline by author (1981-2000)",
       subtitle = "Includes authors with at least two publications in the same school subject",
       x = "Year", y = "Author") +
  theme(axis.text.y = element_text(size = 10))

htmltools::div(style = "width:100%; height:400px;",
girafe(ggobj = p20, 
       options = list(
         opts_hover(css = "fill:orange;stroke:black;cursor:pointer;"),
         opts_hover_inv(css = "opacity:0.2;"),
         opts_toolbar(saveaspng = TRUE)
       ),
       width_svg = 10, height_svg = 8))
Code
p19 <- df %>%
  unnest(Authors) %>%
  filter(Year >= 2001 & Year <= 2011) %>%
  add_count(Authors, `School Subject`) %>%
  filter(n >= 2) %>%
  
  
  ggplot(aes(x = Year, 
             y = reorder(Authors, Year), 
             color = `School Subject`)) +
  geom_point_interactive(aes(
    tooltip = paste0("Author: ", Authors, "\n",
                    "Year: ", Year, "\n",
                    "Total in ", `School Subject`, ": ", n),
    data_id = Authors 
    
  ), size = 4, alpha = 0.7) +
  scale_x_continuous(breaks = seq(2001, 2011, by = 1)) +
  theme_minimal() +
  labs(title = "Publication timeline by author (2001-2011)",
       subtitle = "Includes authors with at least two publications in the same school subject",
       x = "Year", y = "Author") +
  theme(axis.text.y = element_text(size = 10))

htmltools::div(style = "width:100%; height:400px;",
girafe(ggobj = p19, 
       options = list(
         opts_hover(css = "fill:orange;stroke:black;cursor:pointer;"),
         opts_hover_inv(css = "opacity:0.2;"),
         opts_toolbar(saveaspng = TRUE)
       ),
       width_svg = 10, height_svg = 8))
Code
p18 <- df %>%
  unnest(Authors) %>%
  filter(Year >= 2012) %>%
  add_count(Authors, `School Subject`) %>%
  filter(n >= 2) %>%
  ggplot(aes(x = Year, 
             y = reorder(Authors, Year), 
             color = `School Subject`)) +
  geom_point_interactive(aes(
    tooltip = paste0("Author: ", Authors, "\n",
                    "Year: ", Year, "\n",
                    "Total in ", `School Subject`, ": ", n),
    data_id = Authors 
  ), size = 4, alpha = 0.7) +
  #scale_x_continuous(breaks = seq(2012, , by = 1)) + 
  theme_minimal() +
  labs(title = "Publication timeline by author (Since 2012)",
       subtitle = "Includes authors with at least two publications in the same school subject",
       x = "Year", y = "Author") +
  theme(axis.text.y = element_text(size = 10))

htmltools::div(style = "width:100%; height:400px;",
girafe(ggobj = p18, 
       options = list(
         opts_hover(css = "fill:orange;stroke:black;cursor:pointer;"),
         opts_hover_inv(css = "opacity:0.2;"),
         opts_toolbar(saveaspng = TRUE)
       ),
       width_svg = 10, height_svg = 8))
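
The timelines rely on add_count() rather than count(): it attaches the group size to every row instead of collapsing the rows, so each book can still be drawn as its own point. A minimal sketch on invented data:

```r
library(dplyr)

toy <- tibble(
  Authors          = c("Author A", "Author A", "Author B", "Author A"),
  `School Subject` = c("History", "History", "History", "Geography"),
  Year             = c(1905, 1910, 1920, 1930)
)

kept <- toy %>%
  add_count(Authors, `School Subject`) %>%   # adds column n, keeps every row
  filter(n >= 2)

kept
```

In this toy table Author A has three books overall, but only the two History titles are kept, because the threshold is applied per author and school subject rather than per author.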